N=1:

rank | frequency | n-gram |
---|---|---|
1 | 161474 | -n |
2 | 127277 | -a |
3 | 117991 | -k |
4 | 84539 | -o |
5 | 38669 | -i |

N=2:

rank | frequency | n-gram |
---|---|---|
1 | 83146 | -en |
2 | 57444 | -ko |
3 | 54760 | -ak |
4 | 42305 | -an |
5 | 28803 | -ik |

N=3:

rank | frequency | n-gram |
---|---|---|
1 | 45271 | -ren |
2 | 22811 | -ako |
3 | 20383 | -kin |
4 | 17553 | -tik |
5 | 13718 | -tan |

N=4:

rank | frequency | n-gram |
---|---|---|
1 | 31398 | -aren |
2 | 19940 | -ekin |
3 | 11075 | -etan |
4 | 9989 | -atik |
5 | 7867 | -tako |

N=5:

rank | frequency | n-gram |
---|---|---|
1 | 12541 | -rekin |
2 | 6686 | -oaren |
3 | 4822 | -etako |
4 | 4640 | -gatik |
5 | 4305 | -iaren |

The tables show the most frequent letter n-grams at the ends of words for N=1…5; the leading hyphen marks each n-gram as word-final. The procedure parallels section 2.2.5, Most frequent word beginnings. Here the aim is to detect suffixes rather than prefixes.
For N=3, the table is produced by the following query:

SELECT @pos := (@pos + 1), xx.*
FROM (SELECT @pos := 0) r,
     (SELECT COUNT(*) AS cnt, CONCAT("-", RIGHT(word, 3))
      FROM words
      WHERE w_id > 100
      GROUP BY RIGHT(word, 3)
      ORDER BY cnt DESC) xx
LIMIT 5;
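
The same counting can be sketched outside the database for any N. The following is a minimal Python illustration, not the code used to produce the tables. The function name `top_endings` and the toy word list are hypothetical. It assumes one entry per word form, mirroring one row per word in the `words` table, and it breaks ties alphabetically, whereas the SQL query leaves the order of ties unspecified.

```python
from collections import Counter


def top_endings(words, n, limit=5):
    """Return the `limit` most frequent word-final n-grams as (rank, count, n-gram)."""
    # Like RIGHT(word, n) in MySQL, a word shorter than n contributes itself whole.
    counts = Counter(word[-n:] for word in words)
    # Sort by descending count; ties are broken alphabetically (an assumption).
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [
        (rank, count, "-" + ending)
        for rank, (ending, count) in enumerate(ranked[:limit], start=1)
    ]


# Hypothetical toy word list, used only for illustration.
sample_words = [
    "etxearen", "mendiaren", "lagunarekin",
    "etxetik", "menditik", "gizonaren",
]

for n in range(1, 6):
    print(n, top_endings(sample_words, n, limit=3))
```

Unlike the query, this sketch applies no filter corresponding to `w_id > 100`; the input list is assumed to contain only the word forms that should be counted.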